
845 research outputs found

Alinhamento de corpora paralelos [Alignment of parallel corpora]

Author: Simões Alberto
Publication date: 01/06/2003

This document presents NATools, a set of tools for aligning parallel corpora. It describes the alignment process across the levels involved, from conventional sentence alignment down to word alignment, with the creation of the corresponding translation dictionaries. Measurements of the time taken by the alignment are presented, along with the results obtained. Techniques for detecting translations of multi-word terms using the word-alignment algorithm are discussed. The resulting translation dictionaries are explained and their applications explored: web browsing and querying of the dictionaries produced and the corpora used; word-segment alignment (or translation "by example"); and automatic classification of the quality of a translation pair.

Universidade do Minho: RepositoriUM

Repositório Comum
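
As a rough illustration of how a translation dictionary can be derived from sentence-aligned text, as the abstract above describes, the sketch below scores word pairs by co-occurrence. It is not the NATools algorithm: the Dice coefficient, the function name `dice_dictionary`, and the three-sentence Portuguese-English corpus are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import product


def dice_dictionary(pairs):
    """Build a toy translation dictionary from sentence-aligned pairs.

    pairs: iterable of (source_sentence, target_sentence) strings.
    Returns {source_word: (best_target_word, dice_score)}.
    NOTE: illustrative only; this is an assumed scoring scheme,
    not the method implemented in NATools.
    """
    src_count = Counter()  # number of sentence pairs containing each source word
    tgt_count = Counter()  # number of sentence pairs containing each target word
    co_count = Counter()   # number of sentence pairs containing both words
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        co_count.update(product(src_words, tgt_words))

    best = {}
    for (s, t), c in co_count.items():
        # Dice coefficient: 2 * joint count / (source count + target count)
        score = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Hypothetical miniature corpus (Portuguese, English).
corpus = [
    ("a casa", "the house"),
    ("a casa azul", "the blue house"),
    ("o livro azul", "the blue book"),
]
dictionary = dice_dictionary(corpus)
```

On this toy corpus `dictionary["casa"]` is `("house", 1.0)`, while the function word "a" is also paired with "house". Pure co-occurrence cannot separate words that always appear together, which is one reason real aligners use iterative estimation and much larger corpora.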

Cooking flex with Perl

Author: Simões Alberto
Publication date: 01/05/2002

There are many tools for parser generation in Perl. Perl's flexible data structures make it easy to build generic trees. While it is easy to write a grammar and a lexical analyzer using modules like Parse::Yapp and Parse::Lex, this pair of tools is not as efficient as I would like. In this document I present a quick way to cook Parse::Yapp together with the best lexical analyzer I know: flex.

Universidade do Minho: RepositoriUM

Segmentação bilingue com base na marker hypothesis [Bilingual segmentation based on the marker hypothesis]

Author: Simões Alberto
Publication venue: Associação Portuguesa para a Inteligência Artificial (APPIA)
Publication date: 01/12/2007

Translation examples are essential for computer-assisted translation as well as for data-driven machine translation (EBMT and SMT). However, using translation units from parallel corpora directly in translation is not effective: these units are too large, so it is unlikely that the same translation unit will need to be translated more than once. To address this problem, other methodologies must be explored for splitting translation units into smaller parallel segments. One approach that has been used is marker-based segmentation (the Marker Hypothesis). This document reports the experiments carried out using this method to segment Portuguese text (parallel with English).

Universidade do Minho: RepositoriUM

Repositório Comum
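
The marker-based segmentation mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the marker list is a tiny hypothetical sample of English closed-class words, and the rule that a chunk must contain at least one non-marker word is the usual formulation of the Marker Hypothesis rather than something taken from this record.

```python
# Hypothetical, deliberately tiny marker list; a real system would use
# complete closed-class word lists for each language.
MARKERS = {"the", "a", "an", "of", "in", "on", "to", "for",
           "with", "and", "or", "but", "that", "which"}


def marker_segments(sentence, markers=MARKERS):
    """Split a sentence into chunks that each start at a marker word.

    A new chunk is opened at a marker only when the current chunk already
    holds at least one non-marker (content) word, so runs of adjacent
    markers stay together. A trailing chunk made only of markers is merged
    into the previous chunk.
    """
    chunks = []
    current = []
    has_content = False
    for token in sentence.split():
        is_marker = token.lower() in markers
        if is_marker and has_content:
            chunks.append(current)
            current = []
            has_content = False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        if has_content or not chunks:
            chunks.append(current)
        else:
            chunks[-1].extend(current)
    return [" ".join(chunk) for chunk in chunks]


segments = marker_segments("the cat sat on the mat in the house")
# segments == ["the cat sat", "on the mat", "in the house"]
```

Running the same procedure on both sides of a translation unit yields two chunk sequences; pairing those chunks across languages is a separate alignment step that this sketch does not cover.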

Computer science, linguists and languages

Author: Simões Alberto
Publication venue: Universidade do Minho
Publication date: 01/01/2014

[Excerpt] Prologue: For some time there has been a kind of Holy War between researchers in computer science (whom I will loosely call computer scientists) and researchers in the language sciences (whom I will loosely call linguists), because the former have ventured into tasks traditionally carried out by the latter. As a result of these incursions, tasks that usually take months to perform manually are automated and completed quickly with the help of a computer program. Typically, when this work is presented at conferences usually attended by linguists, it draws heavy criticism for the lack of correctness of the results obtained. In this document I intend to present the reasons that seem to me to lead to this behaviour, and to discuss what can be achieved if computer scientists and linguists manage to understand each other's points of view and goals. [...]

Universidade do Minho: RepositoriUM

XML parsing in javascript

Author: Simões Alberto
Publication venue: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH
Publication date: 01/01/2017

With Web 2.0, the dynamic web became a reality. With it came new concepts, such as using asynchronous calls to fetch the data still needed to render a website instead of requesting a whole new page from the server. For this task, developers in recent years have mostly used JSON rather than XML as the data interchange format. Nevertheless, XML is more suitable for some kinds of data interchange, yet even though the web is based on SGML/XML standards, processing XML directly in JavaScript is tricky. In this document, a set of different approaches to parsing XML with JavaScript is described, and a new module, based on a set of translation functions, is presented. Finally, a set of experiments is discussed to evaluate how versatile the proposed approach is.

Universidade do Minho: RepositoriUM

Dagstuhl Research Online Publication Server

EBMT: Example Based Machine Translation

Author: Simões Alberto
Publication date: 01/01/2005

Repositório Comum

Examples Extraction for Machine Translation

Author: Simões Alberto

This presentation focuses on techniques for extracting bilingual resources for machine translation, with emphasis on the extraction of translation examples. It includes a brief experiment on the use of these resources for hybrid machine translation.

Repositório Comum

How social media and DIY culture contribute to democracy, communities and the creative economy

Author: Simões José Alberto
Publication venue: Informa UK Limited
Publication date: 01/12/2016

UID/SOC/04647/2013 authorsversion published

Repositório da Universidade Nova de Lisboa

Automatic Extraction of Translation Resources from Parallel Corpora

Author: Simões Alberto Manuel
Publication date: 08/10/2009

Repositório Comum

Portuguese-English word alignment: Some experiments

Author: Santos Diana, Simões Alberto
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2008

In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessing the relevance of using syntactic information (POS and lemma) rather than just word forms; and (iii) taking into account the direction of translation. We first provide some motivation for the studies and insist on separating type alignment from token alignment. We then briefly describe the resources employed: the EuroParl and COMPARA corpora, and the alignment tools, NATools, introducing some measures to evaluate the two kinds of dictionaries obtained. We then present the results of several experiments, comparing sizes, overlap, translation fertility and alignment density of the several bilingual resources built. We also describe preliminary data on the quality of the resulting dictionaries or alignment results. This work was done in the scope of the Linguateca project, contract no. 339/1.3/C/NAC, jointly funded by the Portuguese government and the European Union. We thank Jose Joao Dias de Almeida for relevant comments during the development of these tools.

CiteSeerX

Universidade do Minho: RepositoriUM